Multi-Agent First Order Constrained Optimization in Policy Space
In the realm of multi-agent reinforcement learning (MARL), achieving high performance is crucial for a successful multi-agent system. Meanwhile, the ability to avoid unsafe actions is becoming an urgent and imperative problem for real-life applications. However, it remains challenging to develop a safety-aware method for multi-agent systems in MARL. In this work, we introduce a novel approach called Multi-Agent First Order Constrained Optimization in Policy Space (MAFOCOPS), which effectively addresses the dual objectives of attaining satisfactory performance and enforcing safety constraints. Using data generated from the current policy, MAFOCOPS first finds the optimal update policy by solving a constrained optimization problem in the nonparameterized policy space. Then, the update policy is projected back into the parametric policy space to achieve a feasible policy. Notably, our method is first-order in nature, making it simple to implement, and exhibits an approximate upper bound on the worst-case constraint violation. Empirical results show that our approach achieves remarkable performance while satisfying safety constraints on several safe MARL benchmarks.
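To make the setting concrete, the following is a minimal sketch, in generic constrained-MDP notation rather than the paper's own, of the objective that safe (MA)RL methods of this kind target: maximize the expected discounted reward return while keeping the expected discounted cost return below a safety budget,

    \max_{\pi} \; J^{R}(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t)\Big]
    \quad \text{s.t.} \quad
    J^{C}(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} c(s_t, a_t)\Big] \le d,

where c is a cost signal flagging unsafe behavior and d is the constraint threshold.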
First Order Constrained Optimization in Policy Space
In reinforcement learning, an agent attempts to learn high-performing behaviors through interacting with the environment; such behaviors are often quantified in the form of a reward function. However, some aspects of behavior, such as those deemed unsafe and to be avoided, are best captured through constraints. We propose a novel approach called First Order Constrained Optimization in Policy Space (FOCOPS) which maximizes an agent's overall reward while ensuring that the agent satisfies a set of cost constraints. Using data generated from the current policy, FOCOPS first finds the optimal update policy by solving a constrained optimization problem in the nonparameterized policy space. FOCOPS then projects the update policy back into the parametric policy space. Our approach has an approximate upper bound on worst-case constraint violation throughout training and, being first-order in nature, is simple to implement. We provide empirical evidence that our simple approach achieves better performance on a set of constrained robotic locomotion tasks.
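To illustrate the projection step described above, here is a minimal PyTorch-style sketch of a FOCOPS-like policy loss; the function name focops_policy_loss, the diagonal-Gaussian policy parameterization, and the default hyperparameter values are all illustrative assumptions, not the authors' implementation:

    import torch
    from torch.distributions import Normal, kl_divergence

    def focops_policy_loss(mu_new, std_new, mu_old, std_old,
                           actions, adv_r, adv_c,
                           lam=1.5, nu=0.1, kl_clip=0.02):
        # Current and data-collecting policies, modeled as diagonal Gaussians.
        pi_new, pi_old = Normal(mu_new, std_new), Normal(mu_old, std_old)
        # Per-state KL(pi_theta || pi_old), summed over action dimensions.
        kl = kl_divergence(pi_new, pi_old).sum(-1)
        # Importance ratio pi_theta(a|s) / pi_old(a|s) on old-policy samples.
        ratio = torch.exp(pi_new.log_prob(actions).sum(-1)
                          - pi_old.log_prob(actions).sum(-1))
        # Projection objective: stay close to the old policy (KL term) while
        # moving toward the nonparametric optimum (advantage term); the cost
        # advantage is weighted by the dual variable nu.
        per_state_loss = kl - (1.0 / lam) * ratio * (adv_r - nu * adv_c)
        # Only take gradients on states still inside the trust region.
        mask = (kl.detach() <= kl_clip).float()
        return (per_state_loss * mask).mean()

In practice the cost multiplier nu would be adjusted between epochs, e.g. nu = clip(nu + lr_nu * (episode_cost - cost_limit), 0, nu_max), while lam stays fixed, a point the review below elaborates on.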
Review for NeurIPS paper: First Order Constrained Optimization in Policy Space
Strengths: The idea behind this work is very clear. Theorem 1 can be easily derived using duality, but the difficulty lies in how to solve it. The paper is creative in its simplification of the updates of the dual variables lambda and mu. Without this, the algorithm would be essentially a primal-dual gradient method, which requires the primal part to be evaluated accurately enough before each update of the dual variables, which is difficult in practice. For lambda, the paper takes advantage of its similarity to the temperature term in maximum entropy RL and thus fixes it during training.
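For context, the duality argument alluded to here typically yields a closed-form optimal update policy. A hedged sketch in the paper's spirit (the exact notation may differ from the paper's) is

    \pi^{*}(a \mid s) \;\propto\; \pi_{\theta_k}(a \mid s)\,
    \exp\!\Big(\tfrac{1}{\lambda}\big(\hat{A}(s, a) - \mu\, \hat{A}^{C}(s, a)\big)\Big),

where lambda acts like a temperature on the advantage term, which is why it can be fixed during training as noted above, and mu weights the cost advantage.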